Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective in producing good-quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction tasks. The proposed approach achieves state-of-the-art performance on various datasets. It came first in the ImageNet scene parsing challenge 2016, the PASCAL VOC 2012 benchmark, and the Cityscapes benchmark. A single PSPNet yields a new record mIoU accuracy of 85.4% on PASCAL VOC 2012 and an accuracy of 80.2% on Cityscapes.
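To make the idea of different-region-based context aggregation concrete, the following is a minimal NumPy sketch of pyramid-style pooling, not the authors' implementation. It average-pools a feature map over grids of several sizes, upsamples each pooled map back to the input resolution, and concatenates the results with the original features. The function names, the bin sizes `(1, 2, 3, 6)`, and the nearest-neighbor upsampling are assumptions made for illustration; the abstract does not specify them, and any learned layers a real network would apply to each level are omitted.

```python
import numpy as np


def adaptive_avg_pool(feat, bins):
    """Average-pool a (C, H, W) feature map into a (C, bins, bins) grid."""
    c, h, w = feat.shape
    out = np.empty((c, bins, bins), dtype=float)
    for i in range(bins):
        # Row range of region i: floor for the start, ceil for the end.
        r0, r1 = (i * h) // bins, -((-(i + 1) * h) // bins)
        for j in range(bins):
            c0, c1 = (j * w) // bins, -((-(j + 1) * w) // bins)
            out[:, i, j] = feat[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out


def upsample_nearest(grid, h, w):
    """Upsample a (C, bh, bw) grid to (C, h, w) by nearest-neighbor lookup."""
    _, bh, bw = grid.shape
    rows = (np.arange(h) * bh) // h
    cols = (np.arange(w) * bw) // w
    return grid[:, rows[:, None], cols[None, :]]


def pyramid_pool(feat, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with context pooled at several region sizes.

    bin_sizes is an illustrative assumption; bins=1 gives a global average,
    larger values give progressively more local sub-region context.
    """
    _, h, w = feat.shape
    levels = [feat.astype(float)]
    for b in bin_sizes:
        pooled = adaptive_avg_pool(feat, b)
        levels.append(upsample_nearest(pooled, h, w))
    return np.concatenate(levels, axis=0)


if __name__ == "__main__":
    feat = np.arange(2 * 6 * 6, dtype=float).reshape(2, 6, 6)
    out = pyramid_pool(feat)
    print(out.shape)
```

In this sketch the output has `C * (1 + len(bin_sizes))` channels, so every pixel carries both its local feature and summaries of the regions that contain it at each scale.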